
    Large scale crowdsourcing and characterization of Twitter abusive behavior

    In recent years, online social networks have suffered an increase in sexism, racism, and other types of aggressive and cyberbullying behavior, often manifested through offensive, abusive, or hateful language. Past scientific work has focused on studying these forms of abusive activity in popular online social networks such as Facebook and Twitter. Building on such work, we present an eight-month study of the various forms of abusive behavior on Twitter, in a holistic fashion. Departing from past work, we examine a wide variety of labeling schemes that cover different forms of abusive behavior. We propose an incremental and iterative methodology that leverages the power of crowdsourcing to annotate a large collection of tweets with a set of abuse-related labels. By applying our methodology and performing statistical analysis for label merging or elimination, we identify a reduced but robust set of labels to characterize abuse-related tweets. Finally, we offer a characterization of our annotated dataset of 80 thousand tweets, which we make publicly available for further scientific exploration.
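    As a rough illustration of the two steps the abstract names, the sketch below (not the authors' code; the annotations, labels, and threshold are made up) aggregates crowd votes per tweet by majority and flags label pairs whose co-occurrence is high enough to be merge candidates.

```python
# Minimal sketch of crowd label aggregation and label-merging analysis.
# All data and the merge threshold are illustrative, not from the paper.
from collections import Counter
from itertools import combinations

# Hypothetical worker annotations: tweet_id -> labels chosen by workers.
annotations = {
    "t1": ["abusive", "hateful", "abusive"],
    "t2": ["offensive", "offensive", "abusive"],
    "t3": ["normal", "normal", "normal"],
}

def majority_label(labels):
    """Return the most frequent label among worker votes."""
    return Counter(labels).most_common(1)[0][0]

gold = {tid: majority_label(votes) for tid, votes in annotations.items()}

def cooccurrence(votes_by_tweet, a, b):
    """Fraction of tweets mentioning a or b where both appear together."""
    both = sum(1 for v in votes_by_tweet.values() if a in v and b in v)
    either = sum(1 for v in votes_by_tweet.values() if a in v or b in v)
    return both / either if either else 0.0

# Merge candidates: label pairs that frequently co-occur (threshold illustrative).
labels = {l for votes in annotations.values() for l in votes}
merge_pairs = [(a, b) for a, b in combinations(sorted(labels), 2)
               if cooccurrence(annotations, a, b) >= 0.5]
print(gold, merge_pairs)
```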

    Simplifying network testing: Techniques and approaches towards automating and simplifying the testing process

    The dramatic increase in the number of companies and consumers that heavily depend on networks mandates the creation of reliable network devices. Such reliability can be achieved by testing both the conformance of individual protocols of an implementation to their corresponding specifications and the interaction between different protocols. With the increase in computing power and the advances in network testing research, one would expect efficient approaches for testing network implementations to be available. However, such approaches are lacking, for reasons such as the complexity of network protocols, the need for different protocols to interoperate, the limited information available about implementations because of proprietary code, and the potentially unbounded size of the network to be tested. To address these issues, a novel technique is proposed that improves the quality of the test while reducing the time and effort network testing requires. The proposed approach achieves these goals by automating the process of creating models to be used for validating an implementation. More precisely, it utilizes observations acquired by monitoring the behavior of the implementation to automatically generate models. In this way, the generated models can accurately represent the actual implementation, and testing is reduced to the problem of verifying that certain properties hold on the generated model. This work presents algorithms that efficiently create models from observations and shows their effectiveness through three different examples. In addition, the difficulty of validating models using theorem provers is addressed by utilizing techniques available in the literature and by proposing approaches that assist testers with completing proofs. Results suggest that the complexity of constructing proofs with theorem proving can be reduced when models are members of the same class, i.e., their structure can be predicted. A final problem this work addresses is that of scale, i.e., the impracticality or even impossibility of testing every possible network configuration. To address this problem, the concept of "self-similarity" is introduced: a self-similar network has the property that it can be sufficiently represented by a smaller network. Thus, proving the correctness of the smaller network is sufficient for proving the correctness of any self-similar network that it represents.
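    To make the observation-to-model idea concrete, here is a toy sketch (not the dissertation's algorithm; the traces, state names, and properties are invented) that merges observed protocol transitions into a model and then checks properties on that model rather than on the implementation.

```python
# Illustrative sketch: infer a transition model from observed traces,
# then verify properties on the inferred model. Data is hypothetical.
observed_traces = [
    [("CLOSED", "open", "LISTEN"), ("LISTEN", "syn", "SYN_RCVD"),
     ("SYN_RCVD", "ack", "ESTABLISHED"), ("ESTABLISHED", "close", "CLOSED")],
    [("CLOSED", "open", "LISTEN"), ("LISTEN", "close", "CLOSED")],
]

# Model generation: merge every observed transition into one relation.
model = {}
for trace in observed_traces:
    for state, event, nxt in trace:
        model.setdefault((state, event), set()).add(nxt)

# Property 1: observations must be deterministic -- each (state, event)
# pair leads to exactly one successor state.
nondet = {k: v for k, v in model.items() if len(v) > 1}
assert not nondet, f"non-deterministic transitions observed: {nondet}"

# Property 2 (example safety check): ESTABLISHED is only reachable
# via an 'ack' event.
bad = [(s, e) for (s, e), succs in model.items()
       if "ESTABLISHED" in succs and e != "ack"]
print("model OK" if not bad else f"property violated at {bad}")
```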

    SmartCoding: An online platform for estimating political parties' policy positions

    We present a platform to estimate parties' (or candidates') policy positions using an 'iterative expert survey' approach based on the Delphi method commonly used in forecasting. In terms of architecture, our challenge was to build a web-based system that handles the estimates provided by a panel of expert coders in a distributed and asynchronous manner, following the principles of anonymity, iteration, and statistical aggregation. We describe the system built for recording and presenting all the relevant information to the coders over multiple rounds, with feedback from each round to subsequent ones, as well as a module for identifying consensus among coders. Finally, we discuss improvements that can be implemented to adapt the platform for cross-national research, as well as for estimation problems outside the field of application illustrated in this paper.
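    A minimal sketch of what a Delphi-style consensus module might compute, under assumed data shapes (the scales, thresholds, and rounds below are illustrative, not the platform's actual logic): experts place a party on a numeric scale, and consensus is declared when the spread of the anonymous estimates falls below a threshold.

```python
# Sketch of iterative expert estimation with a consensus check.
# Scale, threshold, and data are assumptions for illustration only.
import statistics

def consensus_reached(estimates, iqr_threshold=2.0):
    """True if the interquartile range of expert estimates is small enough."""
    qs = statistics.quantiles(estimates, n=4)
    return (qs[2] - qs[0]) <= iqr_threshold

round1 = [3, 7, 5, 8, 2]   # wide disagreement -> run another round with feedback
round2 = [5, 6, 5, 6, 5]   # estimates converge after seeing the round-1 summary

for i, r in enumerate([round1, round2], start=1):
    agg = statistics.median(r)  # statistical aggregation of the panel
    print(f"round {i}: median={agg}, consensus={consensus_reached(r)}")
```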

    A view behind the scene: data structures and software architecture of a VAA

    In today's rapidly evolving and growing online community, many different applications are proposed and implemented. One category of such applications that has drawn considerable attention during the last few years is the so-called Voting Advice Applications (VAAs). VAAs are online systems, used during elections, that allow voters to create a political profile; comparing this profile with the profiles of political parties and candidates provides the voter with an estimate of his/her proximity to those parties and candidates. In this paper, the data structures and the software architecture used for implementing a VAA platform, along with its technical requirements, are presented. Furthermore, a novel approach for supporting multilingual content is described.
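    For readers unfamiliar with the core VAA computation, the sketch below shows one common way such proximity is estimated (the paper does not specify its metric; the city-block distance, questions, and positions here are illustrative assumptions).

```python
# Sketch of a voter-to-party proximity computation, a common VAA core.
# The metric and all data are assumptions, not the platform's code.
voter   = {"q1": 2, "q2": 5, "q3": 1}              # answers on a 1-5 Likert scale
parties = {
    "Party A": {"q1": 1, "q2": 4, "q3": 2},
    "Party B": {"q1": 5, "q2": 1, "q3": 5},
}

def match(voter_ans, party_ans, scale_max=5, scale_min=1):
    """City-block proximity normalised to a 0-100% match score."""
    per_item_max = scale_max - scale_min
    dist = sum(abs(voter_ans[q] - party_ans[q]) for q in voter_ans)
    return 100 * (1 - dist / (per_item_max * len(voter_ans)))

for name, pos in parties.items():
    print(f"{name}: {match(voter, pos):.0f}% match")
```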

    Using location information for sophisticated emergency call management

    It is widely accepted that the faster the response to an incident involving injuries, the higher the probability that lives are saved. Thus, any system that improves the response of the emergency services is expected to be highly beneficial. Improved network connectivity and powerful mobile devices allow the development of smart applications that exploit features such as geographical location identification and Voice over IP. In this paper, we show how caller location information can be utilized to apply policies that enhance emergency call management at both the call-originating network and the emergency service call centre; the ultimate aim is to reduce emergency services' response times.
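    One policy the abstract's idea enables is routing a call to the nearest answering point. The sketch below is hypothetical (the centre names, coordinates, and nearest-centre policy are illustrative, not the paper's scheme) and uses the great-circle distance to pick a destination.

```python
# Hypothetical location-based routing policy: send the emergency call
# to the geographically nearest call centre. Data is illustrative.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

call_centres = {
    "Nicosia PSAP":  (35.1856, 33.3823),
    "Limassol PSAP": (34.7071, 33.0226),
}

def route_call(caller_lat, caller_lon):
    """Select the call centre closest to the caller's reported location."""
    return min(call_centres,
               key=lambda c: haversine_km(caller_lat, caller_lon, *call_centres[c]))

print(route_call(34.77, 32.42))  # a caller near Paphos -> Limassol PSAP
```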

    Feature extraction for tweet classification: do the humans perform better?

    Sentiment analysis of Twitter data has become a research trend over the last decade. Thanks to the Twitter API, massive amounts of tweets relating to a topic of interest can be collected in real time, and performing sentiment analysis of these tweets can support social sensing and opinion mining. For instance, forecasting elections is a primary area in which sentiment analysis of tweets has been extensively applied in recent years. Sentiment analysis of Twitter data presents important challenges compared to the similar task of text classification. Tweets are limited to 140 characters; thus, the conveyed message is compressed and often context-dependent. Tweets are also informal and unstructured, usually lacking grammatical soundness and the use of a standard lexicon. On the other hand, tweets are usually annotated by their authors regarding their topic and sentiment with the aid of hashtags and emoticons. Identifying appropriate features for sentiment analysis of tweets remains an open research area, since text indexing methods face the sparseness problem while POS tagging methods fail due to tweets' lack of grammatical structure. Character-based features, i.e., n-grams of characters, are gaining popularity because they are language-independent; however, their effectiveness remains quite low. In this paper, we argue that the tokens used by humans for sentiment analysis of tweets are probably the best feature set one can use for that purpose. We compare several automatically extracted features with the features (tokens) used by humans for tweet classification, under a machine learning framework. The results show that the manually indicated tokens combined with a Decision Tree classifier outperform any other combination of feature set and classification algorithm. The manually annotated dataset used in our experiments is publicly available for anyone who wishes to use it.
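    The automatic side of such a comparison can be sketched in a few lines of scikit-learn (not necessarily the authors' toolchain; the tweets and labels below are toy data): character n-gram features feeding a Decision Tree classifier.

```python
# Sketch of a character n-gram + Decision Tree pipeline, with toy data.
# The paper compares such automatic features against human-chosen tokens;
# only the automatic pipeline is shown here.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline

tweets = ["love this phone :)", "worst service ever >:(",
          "great game today!!", "totally disappointed..."]
labels = ["pos", "neg", "pos", "neg"]

# Character n-grams are language independent and tolerate the informal,
# unstructured spelling of tweets better than word tokens.
clf = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    DecisionTreeClassifier(random_state=0),
)
clf.fit(tweets, labels)
print(clf.predict(["this is great :)"]))
```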

    Opinion mining from social media short texts: Does collective intelligence beat deep learning?

    The era of big data has, among others, three characteristics: the huge amounts of data created every day and in every form by everyday people, artificial intelligence tools to mine information from those data, and effective algorithms that allow this data mining in real or near-real time. At the same time, opinion mining in social media is nowadays an important component of social media marketing. Digital media giants such as Google and Facebook have developed and employed their own tools for that purpose. These tools are based on publicly available software libraries and tools such as Word2Vec (or Doc2Vec) and fastText, which emphasize topic modeling and extract low-level features using deep learning approaches. So far, researchers have focused their efforts on opinion mining and especially on sentiment analysis of tweets. This trend reflects the availability of the Twitter API, which simplifies automatic data (tweet) collection and the testing of proposed algorithms in real situations. However, if we are really interested in realistic opinion mining, we should consider mining opinions from social media platforms such as Facebook and Instagram, which are far more popular among everyday people. The basic purpose of this paper is to compare various kinds of low-level features, including those extracted through deep learning, as in fastText and Doc2Vec, with keywords suggested by the crowd, called the crowd lexicon herein, collected through a crowdsourcing platform. The application target is sentiment analysis of tweets and Facebook comments on commercial products. We also compare several machine learning methods for creating sentiment analysis models and conclude that, even in the era of big data, allowing people to annotate (a small portion of) the data enables effective artificial intelligence tools to be developed using the learning-by-example paradigm.
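    The crowd-lexicon side of the comparison can be illustrated as follows (a sketch only; the lexicon, comments, labels, and classifier choice are all assumptions, not the paper's setup): each crowd-suggested keyword becomes a count feature for a short text.

```python
# Sketch: crowd-suggested lexicon turned into count features for short
# social media texts, then used to train a classifier. Toy data only.
from sklearn.linear_model import LogisticRegression

crowd_lexicon = ["great", "love", "awful", "broken", "recommend", "waste"]

def lexicon_features(text):
    """One count feature per crowd-suggested keyword."""
    t = text.lower()
    return [t.count(w) for w in crowd_lexicon]

comments = ["I love it, great battery", "awful build, waste of money",
            "would recommend to a friend", "arrived broken twice"]
y = [1, 0, 1, 0]  # 1 = positive, 0 = negative

X = [lexicon_features(c) for c in comments]
model = LogisticRegression().fit(X, y)
print(model.predict([lexicon_features("great value, love the screen")]))
```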

    Applying scheduling policies to improve QoE in wireless Voice-over-IP

    In the increasingly popular Voice over IP (VoIP) application domain, a well-known issue relates to the management of available network bandwidth, which is usually constrained by uplink capacities as well as access point limitations in the case of wireless connections. An approach that has long been advocated for controlling the admission of new connections based on bandwidth availability is Call Admission Control (CAC). The number of lines available on an intermediary gateway for interconnecting VoIP calls to the Public Switched Telephone Network (PSTN) is also a finite resource, but it is in many cases taken for granted. To enhance existing CAC schemes, in this paper we propose a scheme that concentrates on managing the availability of such lines by applying various call scheduling policies. The need for line management is not always apparent, but it is crucial in wireless mesh networks, and in multi-hop user connectivity in general, where the number of users can vary significantly over time. The aim is to allow fairer access to the available lines; better management of such lines improves Quality of Experience, especially for nomadic users who experience opportunistic and intermittent wireless connectivity.
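    A toy sketch of the idea (not the paper's scheme; the policies, call data, and line count are invented): a gateway with a finite pool of PSTN lines admits queued calls under a pluggable scheduling policy, here plain FIFO versus favouring nomadic callers whose connectivity window may close.

```python
# Toy comparison of two call scheduling policies over a finite line pool.
# Policies and data are illustrative assumptions, not the paper's design.
from collections import deque

LINES = 2
calls = [  # (arrival order, caller, is_nomadic)
    (0, "alice", False), (1, "bob", True), (2, "carol", False), (3, "dave", True),
]

def fifo(queue):
    """Serve calls strictly in arrival order."""
    return queue.popleft()

def nomadic_first(queue):
    """Serve nomadic (intermittently connected) callers before fixed ones."""
    best = min(queue, key=lambda c: (not c[2], c[0]))
    queue.remove(best)
    return best

for policy in (fifo, nomadic_first):
    q, served = deque(calls), []
    while q and len(served) < LINES:   # only LINES calls fit right now
        served.append(policy(q)[1])
    print(policy.__name__, "->", served)
```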

    Creating a maritime wireless mesh infrastructure for real-time applications

    Advances in computing and networking technologies have revolutionised the way people communicate using interactive real-time applications such as instant messaging and audio and video conferencing. This category of applications is nowadays readily available to users in terrestrial areas with high-speed Internet connectivity. The aim of this work is to investigate the provision of IP-based network connectivity onboard sailing vessels in order to support real-time communication applications. More precisely, instead of the expensive and high-latency satellite communication solutions widely deployed in the maritime industry, we propose the use of long-range wireless networking technologies to create a vessel-to-shore mesh network that can form an infrastructure for the provision of telephony services based on Voice-over-IP (VoIP).

    Experimental method for testing networks

    Proceedings of the 2005 International Conference on Software Engineering Research and Practice (SERP'05), Volume 2, 2005, Pages 935-941.
    To address problems of scale and validity in testing networks, we propose the application of the experimental method in the testing process. This means that testing becomes an iterative process of running tests, developing models of the network, examining properties of the models to make predictions about the network, and validating (or invalidating) the predictions with further tests. We also propose the use of self-similar structures in networks to reduce the scale of the test effort. Identification of self-similar structures in a network allows the tester to understand the behavior of large networks by experimenting on smaller networks.
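    A hedged sketch of the self-similarity intuition (this is my reading, not the paper's formal definition; the ring topology and degree-based "local view" are illustrative): if every node in a large topology has the same local structure as some node in a small reference topology, experiments on the small network are taken to predict the large one's behaviour.

```python
# Sketch: a small ring as a self-similar representative of a large ring.
# The notion of "local view" used here (node degree) is an assumption.
def ring(n):
    """Adjacency sets for an n-node ring network."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}

def local_views(adj):
    """Set of node 'roles' in the topology, here simply node degrees."""
    return sorted({len(neigh) for neigh in adj.values()})

small, large = ring(4), ring(1000)
# Every node in either ring has degree 2, so the 4-node ring is a
# candidate self-similar representative of the 1000-node one.
print(local_views(small) == local_views(large))  # True
```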